NSF PAR Search | NSF Public Access Repository

KV cache is 1 bit per channel: efficient large language model inference with coupled quantization

Zhang, Tianyi; Yi, Jonah; Xu, Zhaozhuo; Shrivastava, Anshumali (June 2025, Curran Associates Inc.)

Free, publicly-accessible full text available June 5, 2026
NoMAD-attention: efficient LLM inference on CPUs through multiply-add-free attention

Zhang, Tianyi; Yi, Jonah; Yao, Bowen; Xu, Zhaozhuo; Shrivastava, Anshumali (June 2025, Curran Associates Inc.)

Free, publicly-accessible full text available June 5, 2026
ZEN: Empowering Distributed Training with Sparsity-driven Data Synchronization

Wang, Zhuang; Xu, Zhaozhuo; Xi, Jingyi; Wang, Yuke; Shrivastava, Anshumali; Ng, T. S. Eugene (June 2025, 19th USENIX Symposium on Operating Systems Design and Implementation)

Free, publicly-accessible full text available June 7, 2026
SS1: accelerating inference with fast and expressive sketch structured transform

Saedi, Kimia; Desai, Aditya; Walia, Apoorv; Lee, Jihyeong; Zhou, Keren; Shrivastava, Anshumali (June 2025, Curran Associates Inc.)

Free, publicly-accessible full text available June 5, 2026
LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid

Zhang, Tianyi; Shrivastava, Anshumali (January 2025, openreview.net)

Free, publicly-accessible full text available January 22, 2026
Learning Scalable Structural Representations for Link Prediction with Bloom Signatures

https://doi.org/10.1145/3589334.3645672

Zhang, Tianyi; Yin, Haoteng; Wei, Rongzhe; Li, Pan; Shrivastava, Anshumali (May 2024, The Web Conference 2024)

Graph neural networks (GNNs) have shown great potential in learning on graphs, but they are known to perform sub-optimally on link prediction tasks. Existing GNNs are primarily designed to learn node-wise representations and usually fail to capture pairwise relations between target nodes, which prove to be crucial for link prediction. Recent works resort to learning more expressive edge-wise representations by enhancing vanilla GNNs with structural features such as labeling tricks and link prediction heuristics, but they suffer from high computational overhead and limited scalability. To tackle this issue, we propose to learn structural link representations by augmenting the message-passing framework of GNNs with Bloom signatures. Bloom signatures are hashing-based compact encodings of node neighborhoods, which can be efficiently merged to recover various types of edge-wise structural features. We further show that any type of neighborhood overlap-based heuristic can be estimated by a neural network that takes Bloom signatures as input. GNNs with Bloom signatures are provably more expressive than vanilla GNNs and also more scalable than existing edge-wise models. Experimental results on five standard link prediction benchmarks show that our proposed model achieves comparable or better performance than existing edge-wise GNN models while being 3-200× faster and more memory-efficient for online inference.
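The abstract does not spell out how a Bloom signature is built, so the following is only a minimal sketch of the general idea, assuming a single-hash Bloom-filter-style bitmask per node. The names `bloom_signature`, `estimate_common_neighbors`, and the `num_bits` parameter are hypothetical and not taken from the paper. It illustrates how hashed neighborhood encodings can be merged with bitwise operations to approximate overlap-based features such as common-neighbor counts.

```python
import hashlib


def bloom_signature(neighbors, num_bits=256):
    """Encode a collection of neighbor IDs as an integer bitmask.

    Assumption: one hash function per neighbor (SHA-256, truncated),
    chosen only so the sketch is deterministic across runs.
    """
    sig = 0
    for v in neighbors:
        digest = hashlib.sha256(str(v).encode()).digest()
        pos = int.from_bytes(digest[:8], "big") % num_bits
        sig |= 1 << pos
    return sig


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def estimate_common_neighbors(sig_u, sig_v):
    """Approximate |N(u) & N(v)| by counting bits set in both signatures.

    Hash collisions make this an estimate, not an exact count.
    """
    return popcount(sig_u & sig_v)


def merge_signatures(sig_u, sig_v):
    """Signature of the union of two neighborhoods (bitwise OR)."""
    return sig_u | sig_v
```

In this sketch, merging two signatures with OR gives exactly the signature of the union of the two neighborhoods, which is what makes the encoding cheap to combine per candidate edge. The overlap count from AND is approximate and degrades as `num_bits` shrinks relative to neighborhood size.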
In defense of parameter sharing for model-compression

Desai, Aditya; Shrivastava, Anshumali (January 2024, OpenReview.net)

Full Text Available
Soft prompt recovers compressed LLMs, transferably

Xu, Zhaozhuo; Liu, Ziru; Chen, Beidi; Zhong, Shaochen; Tang, Yuxin; Wang, Jue; Zhou, Kaixiong; Hu, Xia; Shrivastava, Anshumali (July 2024, JMLR.org)

Full Text Available
Hardware-aware Compression with Random Operation Access Specific Tile (ROAST) Hashing

Desai, Aditya; Zhou, Keren; Shrivastava, Anshumali (January 2023, International Conference on Machine Learning 2023 (ICML 2023))

Full Text Available
Learning to Retrieve Relevant Experiences for Motion Planning

https://doi.org/10.1109/ICRA46639.2022.9812076

Chamzas, Constantinos; Cullen, Aedan; Shrivastava, Anshumali; Kavraki, Lydia E. (May 2022, 2022 International Conference on Robotics and Automation)

Recent work has demonstrated that motion planners’ performance can be significantly improved by retrieving past experiences from a database. Typically, the experience database is queried for past similar problems using a similarity function defined over the motion planning problems. However, to date, most works rely on simple hand-crafted similarity functions and fail to generalize outside their corresponding training dataset. To address this limitation, we propose FIRE, a framework that extracts local representations of planning problems and learns a similarity function over them. To generate the training data we introduce a novel self-supervised method that identifies similar and dissimilar pairs of local primitives from past solution paths. With these pairs, a Siamese network is trained with the contrastive loss and the similarity function is realized in the network’s latent space. We evaluate FIRE on an 8-DOF manipulator in five categories of motion planning problems with sensed environments. Our experiments show that FIRE retrieves relevant experiences which can informatively guide sampling-based planners even in problems outside its training distribution, outperforming other baselines.
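The abstract names a contrastive loss over a Siamese network's latent space but does not give its exact form. As a rough sketch, assuming the commonly used margin-based formulation over Euclidean distance (the function names and the `margin` default are illustrative, not taken from the paper), the per-pair loss could look like this:

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, similar, margin=1.0):
    """Margin-based contrastive loss for one pair of embeddings.

    Similar pairs are penalized by their squared distance.
    Dissimilar pairs are penalized only when closer than `margin`.
    """
    d = euclidean(emb_a, emb_b)
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

Under this formulation, similar pairs are pulled together and dissimilar pairs are pushed apart until they are at least `margin` away, so distance in the latent space can then serve as the learned similarity function.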
Full Text Available
